Review for NeurIPS paper: Differentiable Meta-Learning of Bandit Policies

Neural Information Processing Systems

As in standard policy-gradient methods, two key parameters appear to be the batch size m and the horizon n. It would be good to provide a sensitivity analysis on these parameters to better assess how the approach scales to complex problems. In particular, what is the effect of the horizon on the gradient estimation? Does the variance blow up, or is the baseline sufficient to keep it under control? In this sense, it might be good to have differentiable strategies that are provably efficient (e.g., with sub-linear regret) for a range of parameter values, so that whatever value of \theta we encounter during optimization will not perform poorly.


Review for NeurIPS paper: Differentiable Meta-Learning of Bandit Policies


The rebuttal helped clarify the questions raised in the reviews. The consensus reached in the discussion is that this is a borderline-plus paper. The reviewers appreciate the contribution's practicality, relevance, and usefulness; at the same time, they remain concerned about the narrow scope and would rather have seen the policy-gradient method applied to parameterized policies for more complex learning problems. One question the rebuttal did not answer successfully concerns the setup in the experiments section, where the two-level learning process remained confusing. On the whole, this is a worthwhile addition to the program.


Differentiable Meta-Learning of Bandit Policies


Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution P. In this work, we learn such policies for an unknown distribution P using samples from P. Our approach is a form of meta-learning and exploits properties of P without making strong assumptions about its form. To do this, we parameterize our policies in a differentiable way and optimize them by policy gradients, an approach that is pleasantly general and easy to implement. We derive effective gradient estimators and propose novel variance reduction techniques. We also analyze and experiment with various bandit policy classes, including neural networks and a novel softmax policy. The latter has regret guarantees and is a natural starting point for our optimization.
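The approach the abstract describes can be illustrated with a minimal sketch: a softmax bandit policy with a single differentiable parameter theta (an inverse temperature over empirical arm means), meta-trained by a score-function (REINFORCE-style) gradient with a mean-reward baseline for variance reduction, averaged over a batch of m Bernoulli bandit instances drawn from a distribution P. All names, the choice of P (uniform arm means), and the scalar parameterization are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_episode(theta, means, n=100):
    """Play one Bernoulli bandit instance with a softmax policy over
    empirical means. Returns the total reward and the accumulated
    score d/d_theta sum_t log pi_theta(a_t), used for the gradient.
    (Illustrative sketch; not the paper's exact policy class.)"""
    K = len(means)
    counts = np.ones(K)   # optimistic initialization avoids division by zero
    sums = np.ones(K)
    total_reward, score = 0.0, 0.0
    for _ in range(n):
        mu_hat = sums / counts
        logits = theta * mu_hat
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(K, p=p)
        # For softmax logits theta * mu_hat:
        # d log pi(a) / d theta = mu_hat[a] - E_p[mu_hat]
        score += mu_hat[a] - p @ mu_hat
        r = float(rng.random() < means[a])
        sums[a] += r
        counts[a] += 1
        total_reward += r
    return total_reward, score

def meta_gradient_step(theta, m=32, K=5, lr=0.01):
    """One policy-gradient step over m instances sampled from P
    (here assumed uniform over arm means), with a batch-mean baseline."""
    rewards, scores = zip(*(run_episode(theta, rng.random(K)) for _ in range(m)))
    rewards = np.array(rewards)
    baseline = rewards.mean()   # simple variance-reduction baseline
    grad = np.mean((rewards - baseline) * np.array(scores))
    return theta + lr * grad

theta = 1.0   # natural starting point: moderate exploration
for _ in range(20):
    theta = meta_gradient_step(theta)
```

The batch size m and horizon n discussed in the first review appear directly as the two sampling loops; the baseline subtraction is what keeps the variance of the score-function estimator manageable as n grows.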